action context
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- North America > United States (0.04)
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction
Pasca, Razvan-George, Gavryushin, Alexey, Kuo, Yen-Ling, Van Gool, Luc, Hilliges, Otmar, Wang, Xi
We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatiotemporal context formed by past actions on objects, coined action context. We propose TransFusion, a multimodal transformer-based architecture that exploits the representational power of language by summarising the action context. TransFusion leverages pre-trained image captioning and vision-language models to extract the action context from past video frames. This action context, together with the next video frame, is processed by the multimodal fusion module to forecast the next object interaction. Our model enables more efficient end-to-end learning, while the large pre-trained language models add common sense and a generalisation capability. Experiments on Ego4D and EPIC-KITCHENS-100 show the effectiveness of our multimodal fusion model and highlight the benefits of using language-based context summaries in a task where vision seems to suffice. Our method outperforms state-of-the-art approaches by 40.4% in relative terms in overall mAP on the Ego4D test set. Video and code are available at https://eth-ait.github.io/transfusion-proj/.
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
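As a rough illustration of what "summarising the action context" in language could look like, the sketch below joins a few past (verb, noun) actions into one short text string. It is not the TransFusion implementation: the `PastAction` type, the `summarise_action_context` function, and the output wording are all hypothetical, and the past actions are assumed to have already been extracted by some captioning or vision-language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastAction:
    """Hypothetical record of one past interaction: a verb applied to an object."""
    verb: str
    noun: str


def summarise_action_context(actions, max_actions=3):
    """Join the most recent past actions into one short text summary.

    `max_actions` (an assumed parameter) limits how far back the summary looks.
    """
    if max_actions < 1:
        raise ValueError("max_actions must be at least 1")
    recent = actions[-max_actions:]
    if not recent:
        return "No past actions observed."
    phrases = [f"{a.verb} {a.noun}" for a in recent]
    return "Past actions: " + ", then ".join(phrases) + "."


# Hypothetical history; in practice it would come from a pre-trained model.
history = [
    PastAction("take", "knife"),
    PastAction("cut", "onion"),
    PastAction("put", "knife"),
    PastAction("take", "pan"),
]
summary = summarise_action_context(history)
print(summary)
```

A text summary like this could then be tokenised and passed, together with features of the next frame, to a fusion model; that step is left out here.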
Long-Horizon Planning and Execution with Functional Object-Oriented Networks
Paulius, David, Agostini, Alejandro, Lee, Dongheui
Following work on joint object-action representations, functional object-oriented networks (FOON) were introduced as a knowledge graph representation for robots. A FOON contains symbolic concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little had been done to show how plans acquired from FOON can be executed by a robot, as the concepts in a FOON are too abstract for execution. We therefore introduce the idea of exploiting object-level knowledge as a FOON for task planning and execution. Our approach automatically transforms FOON into PDDL and leverages off-the-shelf planners, action contexts, and robot skills in a hierarchical planning pipeline to generate executable task plans. We demonstrate our entire approach on long-horizon tasks in CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios.
- North America > United States > Rhode Island (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Austria > Tyrol > Innsbruck (0.04)
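To make the idea of transforming object-level knowledge into PDDL more concrete, the sketch below renders a single action as a PDDL `(:action ...)` definition. It is only an assumption-laden illustration, not the authors' transformation: the `FunctionalUnit` type, the `to_pddl_action` function, and the predicates `(in water cup)`, `(empty bowl)`, and `(in water bowl)` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalUnit:
    """Hypothetical simplification: an action with required and resulting object states."""
    name: str
    preconditions: tuple  # PDDL literals that must hold before the action
    effects: tuple        # PDDL literals that hold after the action


def to_pddl_action(unit):
    """Render one unit as a ground (parameter-free) PDDL action definition."""
    pre = " ".join(unit.preconditions)
    eff = " ".join(unit.effects)
    return (
        f"(:action {unit.name}\n"
        f"  :parameters ()\n"
        f"  :precondition (and {pre})\n"
        f"  :effect (and {eff})\n"
        f")"
    )


# Hypothetical example unit: pouring water from a cup into a bowl.
pour = FunctionalUnit(
    name="pour_water",
    preconditions=("(in water cup)", "(empty bowl)"),
    effects=("(in water bowl)", "(not (in water cup))", "(not (empty bowl))"),
)
pddl_text = to_pddl_action(pour)
print(pddl_text)
```

Actions generated this way could be collected into a domain file and handed to an off-the-shelf planner; parameterised actions and typing are omitted to keep the sketch short.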
Context sequence theory: a common explanation for multiple types of learning
Some of these models referred to principles of brain science, including reinforcement learning, visual perception, and attention. However, there is still an enormous gap between machine learning models and human or mammalian learning. For example, mammalian learning often does not require a large number of training samples. Moreover, the mammalian brain has strong flexibility and plasticity, which support outstanding transfer learning.
Multiple types of learning
Over two thousand years ago, Confucius, the Chinese philosopher, proposed many laws of learning, such as "It's important for learning to practice from time to time" and "we can gain new insights through reviewing old material".
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences
Emelin, Denis, Bras, Ronan Le, Hwang, Jena D., Forbes, Maxwell, Choi, Yejin
In social settings, much of human behavior is governed by unspoken rules of conduct. For artificial systems to be fully integrated into social environments, adherence to such norms is a central prerequisite. We investigate whether contemporary NLG models can function as behavioral priors for systems deployed in social settings by generating action hypotheses that achieve predefined goals under moral constraints. Moreover, we examine whether models can anticipate likely consequences of (im)moral actions, or explain why certain actions are preferable by generating relevant norms. For this purpose, we introduce 'Moral Stories', a crowd-sourced dataset of structured, branching narratives for the study of grounded, goal-oriented social reasoning. Finally, we propose decoding strategies that effectively combine multiple expert models to significantly improve the quality of generated actions, consequences, and norms compared to strong baselines, e.g. through abductive reasoning.
- North America > United States (0.28)
- Asia > Middle East > Jordan (0.04)